Predicting the presence of infectious virus from PCR data: A meta-analysis of SARS-CoV-2 in non-human primates

Researchers and clinicians often rely on molecular assays like PCR to identify and monitor viral infections, instead of the resource-prohibitive gold standard of viral culture. However, it remains unclear when (if ever) PCR measurements of viral load are reliable indicators of replicating or infectious virus. The recent popularity of PCR protocols targeting subgenomic RNA for SARS-CoV-2 has caused further confusion, as the relationships between subgenomic RNA and standard total RNA assays are incompletely characterized and opinions differ on which RNA type better predicts culture outcomes. Here, we explore these issues by comparing total RNA, subgenomic RNA, and viral culture results from 24 studies of SARS-CoV-2 in non-human primates (including 2167 samples from 174 individuals) using custom-developed Bayesian statistical models. On out-of-sample data, our best models predict subgenomic RNA positivity from total RNA data with 91% accuracy, and they predict culture positivity with 85% accuracy. Further analyses of individual time series indicate that many apparent prediction errors may arise from issues with assay sensitivity or sample processing, suggesting true accuracy may be higher than these estimates. Total RNA and subgenomic RNA showed equivalent performance as predictors of culture positivity. Multiple cofactors (including exposure conditions, host traits, and assay protocols) influence culture predictions, yielding insights into biological and methodological sources of variation in assay outcomes–and indicating that no single threshold value applies across study designs. We also show that our model can accurately predict when an individual is no longer infectious, illustrating the potential for future models trained on human data to guide clinical decisions on case isolation. Our work shows that meta-analysis of in vivo data can overcome longstanding challenges arising from limited sample sizes and can yield robust insights beyond those attainable from individual studies. Our analytical pipeline offers a framework to develop similar predictive tools in other virus-host systems, including models trained on human data, which could support laboratory analyses, medical decisions, and public health guidelines.

Of those, the following were immediately excluded: (i) articles published before 2020, (ii) article types that do not generate primary data (e.g., opinions, reviews), and (iii) articles with clearly irrelevant titles based on our predefined eligibility/inclusion criteria (described in the Methods).CS inspected the abstracts of the remaining 275 studies and the full texts of 122 records, according to the same eligibility criteria.

Data collection process
For each included article and every infected individual, the following study design details were obtained from the text, figures, supplementary files, or the corresponding authors, when available: primate species, rhesus macaque origin (Chinese or Indian), ID, sex, age (years or months), age class, treatment group, viral inoculum strain, inoculation route(s), and route specific inoculation dose(s).For every sample, the following information was recorded when available: sample value (converted to log10, when quantitative and reported otherwise), sample time (day post infection, starting at 0), sample type (e.g., swab, tissue), sample location (e.g., liver), sample units (e.g., viral RNA copies/mL), method of quantification (e.g., RT-qPCR, plaque assay), target gene (for PCR), cell line (e.g., VeroE6, for viral culture), and limit of detection and/or quantification.We standardized all ID names as follows: [the first and last initial of the study's first author] _ [the ID name as assigned in the original study, if available].
When raw data was not published or methodological/biological details were missing or inconsistently reported, we contacted the corresponding author(s) using a standardized email template (paired with studyspecific questions) to resolve discrepancies and/or request raw data.When clarification and/or raw data were obtained, the details were updated accordingly.In instances where clarification was not obtained but conflicting information existed in the article, we recorded the information most consistent with the methods section.

Standardizing age class and inoculation doses
We developed a consistent method to assign age classes for all studies, following methods used in prior studies (1)(2)(3)(4)(5).When age was reported in months and years, juvenile rhesus and cynomolgus macaques include individuals with ages <5 years, adults include 5-19 years, and geriatrics include ≥20 years.Juvenile African green monkeys include ages <5 years, adults include 5-14 years, and geriatrics include ≥15 years.When reported age ranges spanned multiple assignments, we used the reported age class if available, otherwise we considered it unknown.
Since inoculation doses are frequently reported as either plaque forming units (pfu) or tissue culture infectious dose 50 (TCID50), we converted all inoculation doses reported in TCID50 to pfu using a standard conversion factor (1 TCID50=0.69pfu) (6).

Confirming RNA types
To confirm whether articles quantified full-length genomic RNA, subgenomic RNA, or total RNA (since reporting practices varied across studies), all relevant primer and probe sequences were extracted from each study or from referenced articles.We used SnapGene software (from Insightful Science; available at snapgene.com) to map all sequences to the SARS-CoV-2/human/CHN/Wu-1 reference genome.Any assays with both forward and reverse primers located within the ORF1ab gene were considered to amplify full-length genomic RNA, as this gene is not located on any canonical subgenomic RNAs.Assays with both forward and reverse primers located within any individual gene downstream of ORF1ab were considered to amplify total RNA, as these sequences can be found on both subgenomic and full-length genomic RNA.Assays where the forward primer was located in the 5' UTR but the reverse primer was located within a gene downstream of ORF1ab were classified as amplifying subgenomic RNA.

Justification and prior description for candidate predictors
Below, we provide further justification for the selection of all candidate predictors, including descriptions of all assigned priors.For the sgRNA models (that predict sgRNA results from totRNA), we use g/d to signify parameters for the logistic component and a/b for the linear component.For the culture model (that predicts culture positivity from totRNA quantities), we use g/y to signify the relevant parameters.

Primary predictors:
We required all sgRNA models to include total RNA copy number (T) as a predictor, given this is the primary effect of interest.We expect the probability of detecting sgRNA to increase as total RNA copy numbers increase, which is supported by studies that find more sgRNA positive samples for those with large quantities of total RNA (7,8).Given that we expect this relationship to be positive, we assign the following prior: dT~N(2,1).We also expect sgRNA copy numbers to increase with total RNA copy numbers, as observed in other studies that find positive linear relationships between them (8-10).We assign the following prior: bT~Gamma(2,0.5).We evaluated culture models including total RNA, sgRNA (SG), or both as primary predictors.We expect the likelihood of detecting infectious virus to increase with increasing quantities of either RNA type, so we set the priors to be: ΨT,ΨSG~N(2, 1).

Age, sex, and non-human primate species:
As with other viruses, age (AGE) and sex (SEX) are hypothesized to affect individual responses to SARS-CoV-2 infection, including within-host infection kinetics and disease severity.Inter-species variability in infection dynamics has also been noted among non-human primate species (SP) and other animal models (11).These differences may affect observed relationships between assays.However, given the complexity of these biological interactions, we do not assign a priori expectations about the direction of these effects, and we set all associated priors to be N(0,1).
Inoculation dose: Since sgRNA is not typically packaged into virions (12) and thus should not meaningfully exist in viral stocks, we expect to find detectable levels of total RNA earlier than sgRNA after inoculation (i.e., lower probabilities of sgRNA detection per total RNA quantity for higher doses; dDOSE~N(-1,0.5)).Larger inoculum doses could also result in (at least initially) higher levels of total RNA relative to sgRNA (bDOSE~N(-0.25,1)).Exposure dose affects virion quantity and hence also total RNA quantity, which in turn may affect the probability of culture positivity through the presence of residual inoculum or other effects on infection kinetics.We make no a priori predictions on the direction of this effect (ΨDOSE~N(0,1)).
Day post infection: sgRNA levels likely remain low for some time in recently infected tissues.They may rise as virions infect local cells and replicate, and the ratio between total RNA and sgRNA may stabilize (8).
Sample type: Differences in the processing and content of non-invasive (e.g., swabs, biofluids) and invasive (e.g., whole tissues obtained at necropsy) samples may affect assay readouts, and so we include sample type as a candidate predictor.We make no a priori predictions on their differences, and instead we assign noninformative priors (dST,bST,ΨST~N(0,1)).

Culture assay:
The sensitivity of endpoint dilution (TCID50) and plaque assays can also vary, with plaque assays typically having lower sensitivity (18).We distinguish between these protocols in a binary predictor (ASSAY), with endpoint dilution treated as the reference.We assign the following prior (ΨASSAY ~N(-0.5,1)).

Article and lab effects:
We assigned the same non-informative priors for the mean and standard deviation of each article's error term in the linear component (σ " ~ N(0, 1); ssd ~ Exp( 1)).We also assigned the same noninformative priors for all lab effect terms (N(0, 0.5)).

Approximate leave-one-out cross-validation
Beyond standard 10-fold cross-validation used in our main analyses, we also evaluated model performance and conducted model selection using Pareto-Smoothed Importance Sampling Approximate Leave-One-Out cross-validation (PSIS-LOO) via the 'loo' R package (19).We ran this method for the linear component of the sgRNA model using RStan version 2.21.0 to use additional functionality of the LOO software (20).All other models were run using CmdStan, as described in the main Methods.
Pareto-k values indicate the accuracy of PSIS-LOO approximations, where values below 0.7 indicate sufficiently reliable estimates.We use the moment-matching functionality offered by the 'loo' package to correct Pareto-k values exceeding 0.7, after which all PSIS-LOO estimates generated in this study were reliable (adjusted Pareto-k < 0.7).

Cross-validation methods and model evaluation metrics
For 10-fold cross-validation, we assigned folds randomly, although we required each fold to contain similar quantities of data from each article.This allowed us to distribute protocols, demographics, and sampling types more evenly across folds.All statistics for this method were calculated separately for training and test sets to evaluate performance on out-of-sample data and assess potential bias or overfitting, with the exception of ELPD and MCC which were only calculated for test sets.
As outlined in the model description of the main Methods, the linear component of the sgRNA models contained article-specific hierarchical error rates.We used the estimates of each article's specific error distribution to generate ELPD values.However, when we calculated median absolute error and the percent of samples falling within given prediction intervals, we did not incorporate article-specific errors.Instead, for each iteration, we sampled the full range of estimated errors by sampling the error distribution of a random included article (uniformly distributed), so these statistics better correspond to performance on new data (e.g., new studies) where prior information on error distributions is unavailable.

Selection procedure and results for the best sgRNA model
In the paragraphs below, we describe our model selection procedure for the sgRNA model, which relied on various performance metrics for each component.Overall, this procedure clearly identified the best model for both the logistic and linear components of the sgRNA model (  ).
For the logistic component, the best model performed substantially better than the simple model for all three statistics considered.The best model had considerably higher ELPD (difference: 67.  ).The best model also correctly predicted sgRNA detectability with higher probability for more samples and had a substantially higher ELPD, both of which reflect higher certainty for true classifications (Fig

Selection procedure and results for the best culture model
In the paragraphs below, we describe our model selection procedure for the culture model, which relied on three primary performance metrics.This procedure clearly identified the best culture model.).

Mathematical form of the best sgRNA model
The mathematical form of the best sgRNA model is outlined below, where we use g/d to signify parameters for the logistic component and a/b for the linear component.Acronyms are as follows: SG: sgRNA, T: total RNA, DOSE: inoculation dose (log10 pfu), SP: species, TG: target gene (standardized to four levels), and DPI: day post infection.

Prediction intervals and parameter estimates
To generate prediction intervals for various statistics, including median probabilities of sgRNA or culture positivity and median predicted sgRNA copy numbers, we used all available post-warmup parameter samples from the best model.We used the same procedure to generate the fit lines shown in Figures 3 and 5.Each prediction is generated using grouped parameter samples (e.g., samples from the same chain and iteration) to preserve correlation structure.

Estimating differences between predicted and known outcomes
To compare median predicted sgRNA copies to observed sgRNA or totRNA copies, we modeled the distribution of each of these quantities as normally distributed random variables with unique means and variances.

Fig 2 ).
Parallel analyses conducted with Pareto-Smoothed Importance Sampling approximate leave-one-out cross-validation resulted in qualitatively similar outcomes and selection of the same model (S2, S3 Table).Sensitivity analyses confirmed qualitatively similar results between informative and non-informative priors for the model selection procedure and for the parameter estimates of the best model (S12 Fig).Prediction accuracy and median absolute error (MAE) showed minimal differences between training and test datasets, for all PCR models considered, offering confidence in the models' generalizability (S2, S3 Table

2 ,
3C).Prediction accuracy and Matthews correlation coefficient (MCC) indicate nearly identical performance of the best and full models (Fig 2, S2 Table).The difference between the estimated log pointwise predictive density (ELPD) for the best and full models falls within two standard errors of the difference (difference: 1.60; standard error: 3.50), again indicating similar model performance (21).When predicting copy numbers for all sgRNA positive samples with the linear component, the best model showed better performance than the full and simple models on all three statistics considered.Predictive performance of the best model is substantially better than the simple model, with 55.0 versus 48.0 percent of samples falling within the 50 percent prediction interval (Fig 2, S3 Table) and a decrease in median prediction error from 0.58 to 0.44 (Fig 3F).The ELPD difference is also substantial (-186.2).Posterior predictive checks show a strong correlation between observed and median predicted sgRNA values for the best model (adjusted R 2 =0.77), which further supports superior performance of the multiple regression model compared to the simple single regression model (adjusted R 2 =0.68).Relative to the full model, the best model has higher ELPD (difference: -4.44) and similar prediction error on test data (0.44 vs. 0.45) (Fig 2, S3 Table).A higher percent of (test) samples fall within the model-generated 50 percent prediction interval for the best model (55.0 vs. 54.5).